AITopics | Shaoguan

Towards Efficient CoT Distillation: Self-Guided Rationale Selector for Better Performance with Fewer Rationales

Yan, Jianzhi, Liu, Le, Pan, Youcheng, Chen, Shiwei, Xiang, Yang, Tang, Buzhou

arXiv.org Artificial Intelligence, Sep-30-2025

Chain-of-thought (CoT) distillation aims to enhance the reasoning of small language models (SLMs) by transferring multi-step reasoning capability from larger teacher models. However, existing work underestimates rationale quality and focuses primarily on data quantity, which may transfer noisy or incorrect information to the student model. To address this issue, we propose Model-Oriented Rationale Selection Distillation (MoRSD), which can discern and select high-quality rationales for distillation to further improve performance. We also propose a Rationale Difficulty (RD) metric to measure the ability of the student model to generate the correct answer under a given rationale. Compared to the baseline, we achieve a 4.6% average improvement on seven datasets over three tasks, using fewer rationales by controlling their accuracy, diversity, and difficulty. Our results reveal that a small portion of high-quality rationales can enhance the reasoning ability of student models more than the entire dataset. Our method promises to be a possible solution for efficient CoT distillation. Our code will be released at https://github.com/Leon221220/MoRSD.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.23574

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > Canada > Ontario > Toronto (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement

Yan, Jianzhi, Liu, Le, Pan, Youcheng, Chen, Shiwei, Yuan, Zike, Xiang, Yang, Tang, Buzhou

arXiv.org Artificial Intelligence, Sep-29-2025

Chain-of-Thought (CoT) reasoning improves performance on complex tasks but introduces significant inference latency due to verbosity. We propose Multiround Adaptive Chain-of-Thought Compression (MACC), a framework that leverages the token elasticity phenomenon (where overly small token budgets can paradoxically increase output length) to progressively compress CoTs via multiround refinement. This adaptive strategy allows MACC to determine the optimal compression depth for each input. Our method achieves an average accuracy improvement of 5.6 percent over state-of-the-art baselines, while also reducing CoT length by an average of 47 tokens and significantly lowering latency. Furthermore, we show that test-time performance (accuracy and token length) can be reliably predicted using interpretable features like perplexity and compression rate on the training set. Evaluated across different models, our method enables efficient model selection and forecasting without repeated fine-tuning, demonstrating that CoT compression is both effective and predictable. Our code will be released at https://github.com/Leon221220/MACC.

large language model, machine learning, qwen2, (21 more...)

arXiv.org Artificial Intelligence

2509.22144

Country:

Europe > Austria > Vienna (0.14)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)

Predicting Human Mobility in Disasters via LLM-Enhanced Cross-City Learning

Tang, Yinzhou, Wang, Huandong, Fan, Xiaochen, Li, Yong

arXiv.org Artificial Intelligence, Jul-29-2025

The vulnerability of cities to natural disasters has increased with urbanization and climate change, making it more important to predict human mobility in disaster scenarios for downstream tasks such as location-based early disaster warning and pre-allocating rescue resources. However, existing human mobility prediction models are mainly designed for normal scenarios and fail to adapt to disaster scenarios due to the shift in human mobility patterns under disaster. To address this issue, we introduce DisasterMobLLM, a mobility prediction framework for disaster scenarios that can be integrated into existing deep mobility prediction methods by leveraging LLMs to model the mobility intention and transferring the common knowledge of how different disasters affect mobility intentions between cities. This framework utilizes a RAG-Enhanced Intention Predictor to forecast the next intention, refines it with an LLM-based Intention Refiner, and then maps the intention to an exact location using an Intention-Modulated Location Predictor. Extensive experiments illustrate that DisasterMobLLM can achieve a 32.8% improvement in terms of Acc@1 and a 35.0% ...

With rapid urbanization [1] and climate change [2], cities across the world are becoming increasingly vulnerable to natural disasters (e.g., heavy rains), putting more human lives and property at risk. To tackle these challenges, a fundamental research problem is to predict human mobility during disaster scenarios, which can support a wide spectrum of downstream emergency response tasks including location-based early disaster warning [3]-[5], pre-allocating rescue resources [6], and planning humanitarian relief [7]. As a classic machine learning problem, human mobility prediction has been studied for decades; however, most existing work [8], [9] has focused on normal scenarios rather than disaster scenarios.
As illustrated in Figure 1(a) and (b), we employ two representative algorithms trained in the normal scenario, i.e., DeepMove [8] and Flashback [10], to predict human mobility in both normal and disaster scenarios. Their performance in disaster scenarios decreases significantly compared with normal scenarios, with an average relative performance gap of 46.4% and 24.5% in terms of accuracy and mean reciprocal rank, respectively.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.19737

Country:

Asia > China > Guangdong Province > Zhuhai (0.04)
Asia > China > Beijing > Beijing (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Asia > China > Guangdong Province > Shaoguan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Manboformer: Learning Gaussian Representations via Spatial-temporal Attention Mechanism

Zhao, Ziyue, Qi, Qining, Ma, Jianfa

arXiv.org Artificial Intelligence, Mar-6-2025

In 3D semantic occupancy prediction for autonomous driving, GaussianFormer offers an alternative to voxel-based grid prediction with lower memory requirements: it describes scenes with sparse, object-based 3D semantic Gaussians. Each 3D Gaussian function represents a flexible region of interest and its semantic features, which are iteratively refined by the attention mechanism. In experiments, we find that the Gaussian functions required by this method exceed the query resolution of the original dense grid network, resulting in impaired performance. Therefore, we consider optimizing GaussianFormer by using the temporal information it leaves unused. We adopt the spatial-temporal self-attention mechanism from previous grid-based occupancy networks and adapt it to GaussianFormer. Experiments on the nuScenes dataset are currently underway.

gaussianformer, information, representation, (14 more...)

arXiv.org Artificial Intelligence

2503.04863

Country:

Asia > Singapore (0.04)
Asia > China > Guangdong Province > Shaoguan (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (0.35)
Information Technology (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

FinSphere: A Conversational Stock Analysis Agent Equipped with Quantitative Tools based on Real-Time Database

Han, Shijie, Zhou, Changhai, Shen, Yiqing, Sun, Tianning, Zhou, Yuhua, Wang, Xiaoxia, Yang, Zhixiao, Zhang, Jingshu, Li, Hongguang

arXiv.org Artificial Intelligence, Jan-8-2025

Current financial Large Language Models (LLMs) struggle with two critical limitations: a lack of depth in stock analysis, which impedes their ability to generate professional-grade insights, and the absence of objective evaluation metrics to assess the quality of stock analysis reports. To address these challenges, this paper introduces FinSphere, a conversational stock analysis agent, along with three major contributions: (1) Stocksis, a dataset curated by industry experts to enhance LLMs' stock analysis capabilities, (2) AnalyScore, a systematic evaluation framework for assessing stock analysis quality, and (3) FinSphere, an AI agent that can generate high-quality stock analysis reports in response to user queries. Experiments demonstrate that FinSphere achieves superior performance compared to both general and domain-specific LLMs, as well as existing agent-based systems, even when they are enhanced with real-time data access and few-shot guidance. The integrated framework, which combines real-time data feeds, quantitative tools, and an instruction-tuned LLM, yields substantial improvements in both analytical quality and practical applicability for real-world stock analysis.

analyscore, finsphere, language model, (12 more...)

arXiv.org Artificial Intelligence

2501.12399

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Guangdong Province > Shaoguan (0.04)
Asia > China > Zhejiang Province > Ningbo (0.04)
(6 more...)

Genre:

Research Report (0.82)
Personal > Interview (0.68)

Industry:

Energy (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Artificial Intelligence in Landscape Architecture: A Survey

Xing, Yue, Gan, Wensheng, Chen, Qidi

arXiv.org Artificial Intelligence, Aug-26-2024

The development history of landscape architecture (LA) reflects the human pursuit of environmental beautification and ecological balance. With the advancement of artificial intelligence (AI) technologies that simulate and extend human intelligence, immense opportunities have been provided for LA, offering scientific and technological support throughout the entire workflow. In this article, we comprehensively review the applications of AI technology in the field of LA. First, we introduce the many potential benefits that AI brings to the design, planning, and management aspects of LA. Secondly, we discuss how AI can assist the LA field in solving its current development problems, including urbanization, environmental degradation and ecological decline, irrational planning, insufficient management and maintenance, and lack of public participation. Furthermore, we summarize the key technologies and practical cases of applying AI in the LA domain, from design assistance to intelligent management, all of which provide innovative solutions for the planning, design, and maintenance of LA. Finally, we look ahead to the problems and opportunities in LA, emphasizing the need to combine human expertise and judgment for rational decision-making. This article provides both theoretical and practical guidance for LA designers, researchers, and technology developers. The successful integration of AI technology into LA holds great promise for enhancing the field's capabilities and achieving more sustainable, efficient, and user-friendly outcomes.

application, artificial intelligence, construction, (13 more...)

arXiv.org Artificial Intelligence

2408.147

Country:

Oceania > Australia > South Australia (0.04)
North America > United States > New York (0.04)
Asia > China > Guangdong Province > Shaoguan (0.04)
(7 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Food & Agriculture > Agriculture (1.00)
Energy (1.00)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Data Science > Data Mining (1.00)
(10 more...)

GeoTransformer: Enhancing Urban Forecasting with Geospatial Attention Mechanisms

Jia, Yuhao, Wu, Zile, Yi, Shengao, Sun, Yifei

arXiv.org Artificial Intelligence, Aug-16-2024

Recent advancements have focused on encoding urban spatial information into high-dimensional spaces, with notable efforts dedicated to integrating sociodemographic data and satellite imagery. These efforts have established foundational models in this field. However, the effective utilization of these spatial representations for urban forecasting applications remains under-explored. To address this gap, we introduce GeoTransformer, a novel structure that synergizes the Transformer architecture with geospatial statistics prior. GeoTransformer employs an innovative geospatial attention mechanism to incorporate extensive urban information and spatial dependencies into a unified predictive model. Specifically, we compute geospatial weighted attention scores between the target region and surrounding regions and leverage the integrated urban information for predictions. Extensive experiments on GDP and ride-share demand prediction tasks demonstrate that GeoTransformer significantly outperforms existing baseline models, showcasing its potential to enhance urban forecasting tasks.

geotransformer, prediction, representation, (17 more...)

arXiv.org Artificial Intelligence

2408.08852

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.93)
Banking & Finance > Economy (0.69)
Transportation > Passenger (0.68)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Qiao, Runqi, Tan, Qiuna, Dong, Guanting, Wu, Minhui, Sun, Chong, Song, Xiaoshuai, GongQue, Zhuoma, Lei, Shanglin, Wei, Zhe, Zhang, Miaoxuan, Qiao, Runfeng, Zhang, Yifan, Zong, Xiao, Xu, Yida, Diao, Muxi, Bao, Zhimin, Li, Chen, Zhang, Honggang

arXiv.org Artificial Intelligence, Jul-1-2024

Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks, such as MathVista and MathVerse, focus more on the result-oriented performance but neglect the underlying principles in knowledge acquisition and generalization. Inspired by human-like mathematical reasoning, we introduce WE-MATH, the first benchmark specifically designed to explore the problem-solving principles beyond end-to-end performance. We meticulously collect and categorize 6.5K visual math problems, spanning 67 hierarchical knowledge concepts and five layers of knowledge granularity. We decompose composite problems into sub-problems according to the required knowledge concepts and introduce a novel four-dimensional metric, namely Insufficient Knowledge (IK), Inadequate Generalization (IG), Complete Mastery (CM), and Rote Memorization (RM), to hierarchically assess inherent issues in LMMs' reasoning process. With WE-MATH, we conduct a thorough evaluation of existing LMMs in visual mathematical reasoning and reveal a negative correlation between solving steps and problem-specific performance. We confirm the IK issue of LMMs can be effectively improved via knowledge augmentation strategies. More notably, the primary challenge of GPT-4o has significantly transitioned from IK to IG, establishing it as the first LMM advancing towards the knowledge generalization stage. In contrast, other LMMs exhibit a marked inclination towards Rote Memorization - they correctly solve composite problems involving multiple knowledge concepts yet fail to answer sub-problems. We anticipate that WE-MATH will open new pathways for advancements in visual mathematical reasoning for LMMs. The WE-MATH data and evaluation code are available at https://github.com/We-Math/We-Math.

junior high school examination paper, knowledge concept, primary school, (12 more...)

arXiv.org Artificial Intelligence

2407.01284

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Beijing > Beijing (0.10)
(18 more...)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > K-12 Education (0.73)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Delayed Bottlenecking: Alleviating Forgetting in Pre-trained Graph Neural Networks

Zhao, Zhe, Wang, Pengkun, Wang, Xu, Wen, Haibin, Xie, Xiaolong, Zhou, Zhengyang, Zhang, Qingfu, Wang, Yang

arXiv.org Artificial Intelligence, Apr-23-2024

Pre-training GNNs to extract transferable knowledge and apply it to downstream tasks has become the de facto standard of graph representation learning. Recent works have focused on designing self-supervised pre-training tasks to extract useful and universal transferable knowledge from large-scale unlabeled data. However, they have to face an inevitable question: traditional pre-training strategies that aim at extracting useful information about pre-training tasks may not extract all useful information about the downstream task. In this paper, we reexamine the pre-training process within traditional pre-training and fine-tuning frameworks from the perspective of Information Bottleneck (IB) and confirm that the forgetting phenomenon in the pre-training phase may cause detrimental effects on downstream tasks. Therefore, we propose a novel Delayed Bottlenecking Pre-training (DBP) framework, which maintains as much mutual information as possible between latent representations and training data during the pre-training phase by suppressing the compression operation, and delays the compression operation to the fine-tuning phase so that the compression can be guided by labeled fine-tuning data and downstream tasks. To achieve this, we design two information control objectives that can be directly optimized and further integrate them into the actual model design. Extensive experiments on both chemistry and biology domains demonstrate the effectiveness of DBP.

downstream task, information, representation, (14 more...)

arXiv.org Artificial Intelligence

2404.14941

Country:

Asia > China > Hong Kong (0.05)
Asia > China > Jiangxi Province > Nanchang (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

pg-Causality: Identifying Spatiotemporal Causal Pathways for Air Pollutants with Urban Big Data

Zhu, Julie Yixuan, Zhang, Chao, Zhang, Huichu, Zhi, Shi, Li, Victor O. K., Han, Jiawei, Zheng, Yu

arXiv.org Artificial Intelligence, Nov-9-2017

Many countries are suffering from severe air pollution. Understanding how different air pollutants accumulate and propagate is critical to making relevant public policies. In this paper, we use urban big data (air quality data and meteorological data) to identify the spatiotemporal (ST) causal pathways for air pollutants. This problem is challenging because: (1) there are numerous noisy and low-pollution periods in the raw air quality data, which may lead to unreliable causality analysis, (2) for large-scale data in the ST space, the computational complexity of constructing a causal structure is very high, and (3) the ST causal pathways are complex due to the interactions of multiple pollutants and the influence of environmental factors. Therefore, we present p-Causality, a novel pattern-aided causality analysis approach that combines the strengths of pattern mining and Bayesian learning to efficiently and faithfully identify the ST causal pathways. First, pattern mining helps suppress the noise by capturing frequent evolving patterns (FEPs) of each monitoring sensor, and greatly reduces the complexity by selecting the pattern-matched sensors as "causers". Then, Bayesian learning carefully encodes the local and ST causal relations with a Gaussian Bayesian network (GBN)-based graphical model, which also integrates environmental influences to minimize biases in the final results. We evaluate our approach with three real-world data sets containing 982 air quality sensors, in three regions of China from 01-Jun-2013 to 19-Dec-2015. Results show that our approach outperforms the traditional causal structure learning methods in time efficiency, inference accuracy and interpretability.

causal pathway, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

1610.07045

Country:

Asia > China > Beijing > Beijing (0.08)
Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Guangdong Province > Shenzhen (0.05)
(23 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine (0.67)
Law (0.48)
Government (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
